A Document Frequency Constraint for Pseudo-Relevance Feedback Models

Authors

  • Stéphane Clinchant
  • Éric Gaussier
Abstract

In this article, we study the behavior of several pseudo-relevance feedback models, highlighting their main characteristics. This leads us to introduce a new constraint for pseudo-relevance feedback models, related to the document frequency (DF) of words. We then analyze, from a theoretical point of view, several pseudo-relevance feedback models with respect to this constraint. This analysis shows that the mixture model used for pseudo-relevance feedback with language models does not satisfy this constraint. We then carry out a series of experiments that validate the DF constraint. To do so, we first use an oracle based on the relevant documents, and then a family of tf-idf-style functions, parameterized so that different members of the family behave differently with respect to the DF constraint. These experiments show the validity and importance of the DF constraint.
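The abstract does not spell out the constraint formally. A minimal sketch, assuming it is read as "given equal total frequency in the feedback set and equal idf, a term occurring in more feedback documents should receive a strictly higher feedback weight", can make both claims concrete. The toy data, the exponent `gamma`, and the function names (`power_tf_idf_weight`, `df_constraint_holds`, `mixture_feedback_model`) are hypothetical illustrations, not the paper's definitions; the tf-idf family shown here is one possible parameterization, not necessarily the one used in the experiments. The mixture model is a simple two-component EM estimate in the style of Zhai and Lafferty's model-based feedback.

```python
import math
from collections import Counter

# Hypothetical toy feedback set, for illustration only.
# "a" and "b" both occur 4 times in total in the feedback set,
# but "a" appears in 4 documents (DF = 4) and "b" in only 2 (DF = 2).
FEEDBACK = [
    Counter({"a": 1, "b": 2, "x": 3}),
    Counter({"a": 1, "b": 2, "x": 3}),
    Counter({"a": 1, "x": 3}),
    Counter({"a": 1, "x": 3}),
]

# Assumed collection statistics: N = 1000 documents; "a" and "b" each occur
# in 50 of them, so their idf is equal and only their feedback DF differs.
IDF = {"a": math.log(1000 / 50), "b": math.log(1000 / 50), "x": math.log(1000 / 500)}

# Assumed collection language model p(w|C), used by the mixture model.
P_COLL = {"a": 0.001, "b": 0.001, "x": 0.01}


def power_tf_idf_weight(term, docs, idf, gamma):
    """Hypothetical tf-idf family: sum over feedback docs of tf**gamma, times idf.

    gamma < 1 makes each document's contribution concave in tf, so spreading
    the same occurrences over more documents increases the weight."""
    return sum(d[term] ** gamma for d in docs if d[term] > 0) * idf[term]


def df_constraint_holds(weight_fn, high_df="a", low_df="b"):
    """True if the higher-DF term gets a strictly higher weight than the other."""
    return weight_fn(high_df) > weight_fn(low_df)


def mixture_feedback_model(docs, p_coll, lam=0.5, iters=50):
    """EM for p(w) = (1 - lam) * p(w|theta_F) + lam * p(w|C).

    Only the aggregated counts c(w, F) are used, so DF never enters."""
    counts = Counter()
    for d in docs:
        counts.update(d)  # add each document's term counts to the feedback total
    total = sum(counts.values())
    theta = {w: c / total for w, c in counts.items()}  # maximum-likelihood start
    for _ in range(iters):
        expected = {}
        for w, c in counts.items():
            fb = (1 - lam) * theta[w]
            # E-step: expected number of occurrences of w generated by theta_F
            expected[w] = c * fb / (fb + lam * p_coll[w])
        z = sum(expected.values())
        # M-step: renormalize expected counts into a distribution
        theta = {w: e / z for w, e in expected.items()}
    return theta


if __name__ == "__main__":
    for gamma in (0.5, 1.0, 2.0):
        ok = df_constraint_holds(lambda t: power_tf_idf_weight(t, FEEDBACK, IDF, gamma))
        print(f"gamma={gamma}: DF constraint satisfied? {ok}")
    theta = mixture_feedback_model(FEEDBACK, P_COLL)
    print("mixture model weights for a and b:", theta["a"], theta["b"])
```

With these toy numbers, `gamma = 0.5` favors the higher-DF term "a", `gamma = 1.0` gives both terms the same weight, and `gamma = 2.0` favors the more concentrated term "b", so different members of the family behave differently with respect to the constraint. The mixture model assigns "a" and "b" identical probabilities because it only sees the summed counts of the feedback set, which is one way to see why such a model cannot satisfy a DF-based constraint.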

Similar resources

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, the search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Iterative Estimation of Document Relevance Score for Pseudo-Relevance Feedback

Pseudo-relevance feedback (PRF) is an effective technique for improving the retrieval performance through updating the query model using the top retrieved documents. Previous work shows that estimating the effectiveness of feedback documents can substantially affect the PRF performance. Following the recent studies on theoretical analysis of PRF models, in this paper, we introduce a new constra...

Modèles probabilistes pour les fréquences de mots et la recherche d'information. (Probabilistic Models of Document Collections)

The present study deals with word frequency distributions and their relation to probabilistic Information Retrieval (IR) models. We examine the burstiness phenomenon (a rich-get-richer phenomenon) of word frequencies in textual collections. We propose to model this phenomenon as a property of probability distributions and we show that the Beta Negative Binomial distribution is a good statisti...

Mining Specific and General Features in Both Positive and Negative Relevance Feedback: QUT E-Discovery Lab at the TREC 2010 Relevance Feedback Track

User relevance feedback is usually utilized by Web systems to interpret user information needs and retrieve effective results for users. However, how to discover useful knowledge in user relevance feedback and how to wisely use the discovered knowledge are two critical problems. However, understanding what makes an individual document good or bad for feedback can lead to the solution of the pre...

Improving the Robustness of Relevance-Based Language Models

We propose a new robust relevance model that can be applied to both pseudo feedback and true relevance feedback in the language-modeling framework for document retrieval. There are three main differences between our new relevance model and the Lavrenko-Croft relevance model. First, a query is treated as a short, special document and included in approximating a relevance model, in addition to a ...

Enhancing Relevance Models with Adaptive Passage Retrieval

Passage retrieval and pseudo relevance feedback/query expansion have been reported in the literature as two effective means of improving document retrieval. Relevance models, while improving retrieval in most cases, hurt performance on some heterogeneous collections. Previous research has shown that combining passage-level evidence with pseudo relevance feedback brings added benefits. In this pap...

Publication date: 2011